Insight into telomere regulation: road to discovery and intervention in plasma drug-protein targets

Background: Telomere length is a critical metric linked to aging, health, and disease. Currently, the exploration of target proteins related to telomere length is usually limited to the context of aging and specific diseases, which limits the discovery of more relevant drug targets. This study integrated large-scale plasma cis-pQTL data and telomere length GWAS datasets. We used Mendelian randomization (MR) to identify drug target proteins for telomere length, providing essential clues for future precision therapy and targeted drug development.

Methods: We used plasma cis-pQTL data from a previous GWAS study (3,606 pQTLs associated with 2,656 proteins) and a GWAS dataset of telomere length (sample size: 472,174; GWAS ID: ieu-b-4879) from UK Biobank. Through MR, external validation, and reverse causality testing, we identified essential drug target proteins for telomere length. We also performed co-localization, phenome-wide association studies and enrichment analysis, protein-protein interaction network construction, a search for existing intervening drugs, and potential drug/compound prediction for these critical targets to strengthen and expand our findings.

Results: After Bonferroni correction (p < 0.05/734), RPN1 (OR: 0.96; 95% CI: (0.95, 0.97)), GDI2 (OR: 0.94; 95% CI: (0.92, 0.96)), and NT5C (OR: 0.97; 95% CI: (0.95, 0.98)) had significant negative causal associations with telomere length; TYRO3 (OR: 1.11; 95% CI: (1.09, 1.15)) had a significant positive causal association with telomere length. GDI2 shared the same genetic variants with telomere length (coloc.abf-PPH4 > 0.8).

Conclusion: Genetically determined plasma RPN1, GDI2, NT5C, and TYRO3 have significant causal effects on telomere length and can potentially be drug targets. Further exploration of the role and mechanism of these proteins/genes in regulating telomere length is needed.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-024-10116-5.


INTRODUCTION
2 Background
Explain the scientific background and rationale for the reported study. What is the exposure? Is a potential causal relationship between exposure and outcome plausible? Justify why MR is a helpful method to address the study question.
MR employs genetic variation as an instrumental variable and allows the assessment of the role of protein targets on specific disease or health parameters (15).
In the primary analysis, we used "TwoSampleMR" (https://github.com/MRCIEU/TwoSampleMR) to assess the causal effect of plasma cis-pQTLs on telomere length. If the cis-pQTLs for a plasma protein corresponded to a single SNP, the causal effect was assessed using the Wald ratio method; conversely, if they corresponded to multiple SNPs, the causal effect was assessed using the inverse variance weighted (IVW) MR method.
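The choice between the two estimators can be sketched in a few lines of Python. This is a minimal illustration of the standard Wald ratio and fixed-effect IVW formulas, not the TwoSampleMR implementation; the function names and the first-order standard error for the Wald ratio are assumptions made for this sketch.

```python
import math

def wald_ratio(beta_exp, beta_out, se_out):
    """Single-SNP estimate: SNP-outcome effect divided by SNP-exposure effect."""
    beta = beta_out / beta_exp
    # ASSUMPTION: first-order (delta-method) standard error that ignores
    # uncertainty in the SNP-exposure effect.
    se = abs(se_out / beta_exp)
    return beta, se

def ivw(betas_exp, betas_out, ses_out):
    """Fixed-effect inverse variance weighted estimate across several SNPs."""
    weights = [1.0 / se ** 2 for se in ses_out]
    num = sum(w * bx * by for w, bx, by in zip(weights, betas_exp, betas_out))
    den = sum(w * bx ** 2 for w, bx in zip(weights, betas_exp))
    return num / den, math.sqrt(1.0 / den)

def mr_estimate(betas_exp, betas_out, ses_out):
    """Use the Wald ratio for one instrument and IVW for several."""
    if not betas_exp or not (len(betas_exp) == len(betas_out) == len(ses_out)):
        raise ValueError("need equally sized, non-empty effect lists")
    if len(betas_exp) == 1:
        return ("Wald ratio",) + wald_ratio(betas_exp[0], betas_out[0], ses_out[0])
    return ("IVW",) + ivw(betas_exp, betas_out, ses_out)

# Hypothetical effect sizes, for illustration only.
print(mr_estimate([0.5], [0.1], [0.02]))
print(mr_estimate([0.5, 0.25], [0.1, 0.05], [0.02, 0.02]))
```

With a single instrument the IVW formula reduces to the Wald ratio, which is why the two branches give consistent results.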
b) Describe how genetic variants were handled in the analyses and, if applicable, how their weights were selected.
MR employs genetic variation as an instrumental variable and allows the assessment of the role of protein targets on specific disease or health parameters (15).
In the reverse MR analysis, we used five methods to assess causal effects: MR-IVW, MR-Egger, weighted median, simple mode, and weighted mode. We used the Bonferroni correction to control the false positive rate due to multiple comparisons (p < 0.05/734).

7 Assessment of assumptions
Describe any methods or prior knowledge used to assess the assumptions or justify their validity. (p. 6)
We performed co-localization analyses of previously identified key plasma cis-pQTLs and telomere length. In the reverse MR analysis, we used five methods to assess causal effects: MR-IVW, MR-Egger, weighted median, simple mode, and weighted mode. The odds ratio (OR) for telomere length measures the degree of change in risk per standard deviation (SD) increase in plasma protein levels.
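As a small worked example, an MR effect estimate and its standard error can be turned into an OR with a 95% confidence interval as below. This sketch assumes the reported OR is the exponentiated MR estimate per SD increase in protein level, which the text does not state explicitly; the function name and input values are hypothetical.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Exponentiate an effect estimate and its Wald-type 95% interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical log-scale estimate and standard error, for illustration only.
or_, lower, upper = odds_ratio_with_ci(beta=-0.04, se=0.005)
print(f"OR = {or_:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

The interval is symmetric on the log scale, so it is not symmetric around the OR itself.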
c) If the data sources include meta-analyses of previous studies, provide the assessments of heterogeneity across these studies. (p. 5)
In addition, we screened plasma for cis-pQTLs. For the extraction of cis-pQTLs for primary analysis, the methodology was consistent with Zheng et al., which involved significant associations (p ≤ 5 × 10⁻⁸), removal of SNPs and proteins within the human major histocompatibility complex (MHC) region, linkage disequilibrium (LD) clumping (r² < 0.001), pleiotropy and consistency testing, and cis-pQTL screening within ±500 kb (20).
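Three of the screening criteria above (significance, MHC exclusion, and the ±500 kb cis window) can be expressed as simple filters. The sketch below is an illustration under stated assumptions, not the pipeline of Zheng et al.: the record fields, the MHC coordinates, and the use of a single gene reference position are all hypothetical, and LD clumping and pleiotropy testing are omitted because they need a reference panel and external data.

```python
from dataclasses import dataclass

P_THRESHOLD = 5e-8        # genome-wide significance, as in the text
CIS_WINDOW_BP = 500_000   # +/-500 kb cis window, as in the text
# ASSUMPTION: approximate MHC boundaries on chromosome 6; the text gives none.
MHC_CHROM, MHC_START, MHC_END = "6", 25_000_000, 34_000_000

@dataclass
class Pqtl:
    rsid: str
    chrom: str
    pos: int
    pval: float
    gene_chrom: str
    gene_pos: int  # hypothetical reference coordinate of the protein-coding gene

def in_mhc(chrom: str, pos: int) -> bool:
    return chrom == MHC_CHROM and MHC_START <= pos <= MHC_END

def is_cis(q: Pqtl) -> bool:
    return q.chrom == q.gene_chrom and abs(q.pos - q.gene_pos) <= CIS_WINDOW_BP

def passes_filters(q: Pqtl) -> bool:
    if q.pval > P_THRESHOLD:
        return False
    # Drop the record if either the SNP or the gene lies in the MHC region.
    if in_mhc(q.chrom, q.pos) or in_mhc(q.gene_chrom, q.gene_pos):
        return False
    return is_cis(q)

def select_instruments(pqtls):
    return [q for q in pqtls if passes_filters(q)]

# Hypothetical records, for illustration only.
candidates = [
    Pqtl("rsA", "1", 1_000_000, 1e-12, "1", 1_200_000),   # kept
    Pqtl("rsB", "1", 1_000_000, 1e-5, "1", 1_200_000),    # not significant
    Pqtl("rsC", "6", 30_000_000, 1e-12, "6", 30_100_000), # inside MHC
    Pqtl("rsD", "2", 5_000_000, 1e-12, "2", 9_000_000),   # trans, outside window
]
print([q.rsid for q in select_instruments(candidates)])
```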
d) For two-sample MR: i. Provide justification of the similarity of the genetic variant-exposure associations between the exposure and outcome samples. ii. Provide information on the number of individuals who overlap between the exposure and outcome studies. (p. 16)
In the primary analysis, all cis-pQTLs corresponded to only one SNP, preventing us from performing heterogeneity and pleiotropy analyses of overall causal effects, which may have limited an in-depth understanding of multifactorial effects.

11 Main results
a) Report the associations between genetic variant and exposure, and between genetic variant and outcome, preferably on an interpretable scale. (p. 9)
In the primary analysis, we identified 11 drug target proteins with significant causal associations with telomere length (p < 0.05/734). According to Wald ratio analysis, nine plasma proteins, APOA5, SERPINF1, RPN1, LCT, TYMP, PSMB1, GDI2, GSTO1, and NT5C, had negative causal associations with telomere length, and two plasma proteins, KDELC2 and TYRO3, had positive causal associations with telomere length (Figure 2) (Table 1). In external validation, five plasma proteins, GDI2, GSTO1, NT5C, RPN1, and TYRO3, remained significantly causally associated with telomere length (p < 0.05/11) (Supplementary Table 2). In the reverse causality analysis, there was a reverse causal effect of telomere length on GSTO1 (p = 0.013) (Supplementary Table 3). As a result of these analyses, we identified GDI2, NT5C, RPN1, and TYRO3 as the four essential drug target proteins for telomere length.
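The two thresholds quoted in this passage follow from dividing a significance level of 0.05 by the number of tests: 734 in the primary analysis and 11 in external validation. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

primary = bonferroni_threshold(0.05, 734)     # primary analysis
validation = bonferroni_threshold(0.05, 11)   # external validation

print(f"primary analysis:    p < {primary:.2e}")     # 6.81e-05
print(f"external validation: p < {validation:.2e}")  # 4.55e-03
```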
b) Report MR estimates of the relationship between exposure and outcome, and the measures of uncertainty from the MR analysis, on an interpretable scale, such as odds ratio or relative risk per SD difference. (p. 5)
The odds ratio (OR) for telomere length measures the degree of change in risk per standard deviation (SD) increase in plasma protein levels.

12 Assessment of assumptions
a) Report the assessment of the validity of the assumptions. (p. 16)
In the primary analysis, all cis-pQTLs corresponded to only one SNP, preventing us from performing heterogeneity and pleiotropy analyses of overall causal effects, which may have limited an in-depth understanding of multifactorial effects.
b) Report any additional statistics (e.g., assessments of heterogeneity across genetic variants, such as I², Q statistic or E-value). (p. NA)
F statistics

13 Sensitivity analyses and additional analyses
a) Report any sensitivity analyses to assess the robustness of the main results to violations of the assumptions. (p. 16)
In the primary analysis, all cis-pQTLs corresponded to only one SNP, preventing us from performing heterogeneity and pleiotropy analyses of overall causal effects, which may have limited an in-depth understanding of multifactorial effects.
b) Report results from other sensitivity analyses or additional analyses. (p. NA)
NA
c) Report any assessment of direction of causal relationship (e.g., bidirectional MR). (p. 9)
In the reverse causality analysis, there was a reverse causal effect of telomere length on GSTO1 (p = 0.013) (Supplementary Table 3).
To this end, using large-scale plasma cis-pQTL data and telomere length GWAS datasets, we conducted this MR study. Using two-sample MR with external validation and reverse causality testing, we established GDI2, NT5C, RPN1, and TYRO3 as essential proteins for telomere length. Our work marks a pioneering effort in elucidating the role of four plasma proteins, GDI2, NT5C, RPN1, and TYRO3, in regulating telomere length, providing a new perspective and essential information for the field. These drug targets are promising for addressing cancer and age-related diseases and promoting personalized medicine, as they may modulate telomere length.

(pp. 2-3)
Researchers have recently uncovered links between the telomere length regulatory effects of specific genes or proteins and certain diseases. A Mendelian randomization (MR) study found that a reduction in HbA1c induced by GPD1, a putative target of simulated metformin, was associated with longer leukocyte telomere length (11). EGFR enhances telomerase activity (affecting telomere length) by potentiating the transcription of TERT, which correlates with the differentiation grade and prognosis of non-small-cell lung cancer (12). The shelterin complex consists of six proteins: TRF1, TRF2, POT1, RAP1, TIN2, and TPP1. Its dysfunction or defects may shorten telomeres and promote aging (13). Upregulation of ETS transcription factors involved in the reactivation of telomerase can diminish the efficacy of BRAF inhibitors in patients with BRAF-mutant pediatric gliomas (14). An important future research direction is to develop specific drug targets that intervene in telomere length, enabling the treatment of diseases and the slowing of aging. However, current explorations of drug targets related to telomere length are usually limited to the context of specific diseases and aging, which limits the discovery of additional drug targets, as researchers prefer to focus on targets that are directly related to specific biological processes.

(pp. 4-5)
In addition, we screened plasma for cis-pQTLs. For the extraction of cis-pQTLs for primary analysis, the methodology was consistent with Zheng et al., which involved significant associations (p ≤ 5 × 10⁻⁸), removal of SNPs and proteins within the human major histocompatibility complex (MHC) region, linkage disequilibrium (LD) clumping (r² < 0.001), pleiotropy and consistency testing, and cis-pQTL screening within ±500 kb (20). For the extraction of cis-pQTLs used for external validation, we adopted the following approach: 1) retain single nucleotide polymorphisms (SNPs) with genome-wide significant associations (p < 5 × 10⁻⁸); 2) retain SNPs with minor allele frequency between 0.01 and 0.99; 3) retain SNPs within 1 Mb upstream or downstream of the gene transcription start site; 4) exclude SNPs in LD with each other (clumping at r² < 0.001).
c) Describe the MR estimator (e.g. two-stage least squares, Wald ratio) and related statistics. Detail the included covariates and, in case of two-sample MR, whether the same covariate set was used for adjustment in the two samples. (pp. 5, 6)

d) Explain how missing data were addressed. (p. NA)
NA
e) If applicable, indicate how multiple testing was addressed. (p. 5)

c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period. (p. 5)
Abbreviations in the graph: ln = natural logarithm; PVE = proportion of variance explained.
d) When relevant, report and compare with estimates from non-MR analyses. (p. NA)
NA
e) Consider additional plots to visualize results (e.g., leave-one-out analyses). (p. NA)
NA

DISCUSSION
14 Key results
Summarize key results with reference to study objectives. (p. 12)
of the study results (a) to other populations, (b) across other exposure periods/timings, and (c) across other levels of exposure (p. 17)
Finally, all cis-pQTL data and the telomere length GWAS dataset used in this study were derived from European
Describe the study design and the underlying population, if possible. Describe the setting, locations, and relevant dates, including periods of recruitment, exposure, follow-up, and data collection, when available.
population: European) was derived from a GWAS study by Codd et al. (22). Codd et al.

METHODS
4 Study design and data sources
Present key elements of the study design early in the article. Consider including a table listing sources of data for all phases of the study. For each data source contributing to the analysis, describe the following:
a) Setting:
by Ferkingstad et al. involving 35,559 Icelanders, which identified 18,084 pQTLs that were associated with protein levels in plasma (21). The GWAS dataset for telomere length (sample size: 472,174, population: European) was derived from a GWAS study by Codd et al. (22). Codd et al.
d) For each exposure, outcome, and other relevant variables, describe methods of assessment and diagnostic criteria for diseases. (p. 4)
The plasma pQTL data used for the primary analysis were obtained from an MR study by Zheng et al. that validated 3,606 pQTLs associated with 2,656 proteins (20). These data were obtained from five previous GWAS. The plasma pQTL data used for external validation came from a genome-wide association study (GWAS) performed by Ferkingstad et al. involving 35,559 Icelanders, which identified 18,084 pQTLs that were associated with protein levels in plasma (21). The GWAS dataset for telomere length (sample size: 472,174, population: European) was derived from a GWAS study by Codd et al. (22). Codd et al.
15 Limitations
Discuss limitations of the study, taking into account the validity of the IV assumptions, other sources of potential bias, and imprecision. Discuss both direction and magnitude of any potential bias and any efforts to address them. (p. 16)
In the primary analysis, all cis-pQTLs corresponded to only one SNP, preventing us from performing heterogeneity and pleiotropy analyses of overall causal effects, which may have limited an in-depth understanding of multifactorial effects.
c) Clinical relevance: Discuss whether the results have clinical or public policy relevance, and to what extent they inform effect sizes of possible interventions. (p. 17)